1. (Currently Amended) A method of automatically creating a dictionary for clustering 
documents comprising: 

inputting a maximum dictionary size; 

determining a frequency of each word in each of said documents; 

creating a Hashtable dictionary of most frequently occurring words in said documents as 
limited by said maximum dictionary size; 

determining a frequency of phrases in each of said documents that contain only words in 
said Hashtable dictionary; 

adding most frequently occurring phrases to said Hashtable dictionary; and 

outputting said most frequently occurring words and said most frequently occurring 
phrases as said dictionary. 
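
For illustration only, the steps recited in claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: a phrase is assumed to be a pair of adjacent words, words are split on whitespace, and the names `build_dictionary` and `max_phrases` are hypothetical (the claim does not say how many phrases are added).

```python
from collections import Counter


def build_dictionary(documents, max_size, max_phrases):
    """Return frequent words plus frequent phrases built only from those words."""
    # Determine the frequency of each word across the documents.
    word_counts = Counter()
    for doc in documents:
        word_counts.update(doc.lower().split())

    # Keep the most frequent words, limited by the maximum dictionary size.
    dictionary = dict(word_counts.most_common(max_size))

    # Count phrases (assumed here: adjacent word pairs) whose words are
    # all already in the dictionary.
    phrase_counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        for first, second in zip(tokens, tokens[1:]):
            if first in dictionary and second in dictionary:
                phrase_counts[f"{first} {second}"] += 1

    # Add the most frequent phrases to the same dictionary and output it.
    # max_phrases is an assumed parameter, not recited in the claim.
    dictionary.update(phrase_counts.most_common(max_phrases))
    return dictionary
```

A Python `dict` stands in for the hash table here; both words and phrases are stored as keys, with their frequencies as values.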

2. (Currently Amended) The method in claim 1, wherein said determining a frequency of 
each word comprises: 

removing punctuation and case from said documents; 
removing stop words from said document; 
replacing words in said documents with synonyms; 
removing duplicate words from said documents; 

adding remaining words to said Hashtable dictionary as limited by said maximum 
dictionary size; 

determining said frequency of each word remaining in said Hashtable dictionary; and 
removing words below a frequency level from said Hashtable dictionary. 
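
The word-frequency steps of claim 2 can be sketched as follows. This is an illustrative sketch under assumptions: the function and parameter names are hypothetical, the size limit is assumed to mean that new words are admitted only while the dictionary has room, and because duplicates are removed per document, the resulting frequency is the number of documents containing the word.

```python
import string
from collections import Counter


def word_frequencies(documents, stop_words, synonyms, max_size, min_frequency):
    """Count, per word, how many documents contain it after normalization."""
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for doc in documents:
        # Remove punctuation and case.
        tokens = doc.translate(strip_punct).lower().split()
        # Remove stop words, then replace words with their synonyms.
        tokens = [t for t in tokens if t not in stop_words]
        tokens = [synonyms.get(t, t) for t in tokens]
        # Remove duplicate words within the document, keeping first-seen order.
        for word in dict.fromkeys(tokens):
            # Assumed reading of the size limit: admit a new word only
            # while the dictionary has fewer than max_size entries.
            if word in counts or len(counts) < max_size:
                counts[word] += 1
    # Remove words below the frequency level.
    return {w: c for w, c in counts.items() if c >= min_frequency}
```

The stop words, synonyms, and frequency level are passed in as arguments, which corresponds to the inputs recited in claim 3.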

3. (Original) The method in claim 2, further comprising inputting one or more of said stop 
words, said synonyms, and said frequency level. 

4. (Currently Amended) The method in claim 1, wherein said determining a frequency of 
phrases comprises: 

removing punctuation and case from said documents; 
removing stop words from said document; 
replacing words in said documents with synonyms; 

adding said phrases in each of said documents that contain only words in said Hashtable 
dictionary to said Hashtable dictionary; 

determining said frequency of said phrases remaining in said Hashtable dictionary; and 
removing phrases below a frequency level from said Hashtable dictionary. 
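
The phrase-frequency steps of claim 4 can be sketched in the same way. Again this is only a sketch under assumptions: a phrase is assumed to be a run of `phrase_len` adjacent words (two by default), and all names are hypothetical.

```python
import string
from collections import Counter


def phrase_frequencies(documents, dictionary, stop_words, synonyms,
                       min_frequency, phrase_len=2):
    """Count phrases whose words all appear in the word dictionary."""
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for doc in documents:
        # Remove punctuation and case, drop stop words, apply synonyms.
        tokens = doc.translate(strip_punct).lower().split()
        tokens = [synonyms.get(t, t) for t in tokens if t not in stop_words]
        # Slide a window over the tokens; keep only phrases made up
        # entirely of dictionary words.
        for i in range(len(tokens) - phrase_len + 1):
            window = tokens[i:i + phrase_len]
            if all(word in dictionary for word in window):
                counts[" ".join(window)] += 1
    # Remove phrases below the frequency level.
    return {p: c for p, c in counts.items() if c >= min_frequency}
```

Restricting phrases to dictionary words keeps the number of candidate phrases bounded by the word dictionary built earlier.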

5. (Original) The method in claim 4, further comprising inputting one or more of said stop 
words, said synonyms, and said frequency level. 

6. (Currently Amended) A method of automatically creating a dictionary for clustering text 
documents comprising: 

inputting a maximum dictionary size; 

performing a first pass for each of said documents comprising: 

determining a frequency of each word in each of said documents; and 
creating a Hashtable dictionary of most frequently occurring words in said 
documents as limited by said maximum dictionary size; 

performing a second pass for each of said documents comprising: 

determining a frequency of phrases in each of said documents that contain only 
words in said Hashtable dictionary; and 

adding most frequently occurring phrases to said Hashtable dictionary; and 
outputting said most frequently occurring words and said most frequently occurring 
phrases as said Hashtable dictionary. 

7. (Currently Amended) The method in claim 6, wherein said determining a frequency of 
each word comprises: 

removing punctuation and case from said documents; 
removing stop words from said document; 
replacing words in said documents with synonyms; 
removing duplicate words from said documents; 

adding remaining words to said Hashtable dictionary as limited by said maximum 
dictionary size; 

determining said frequency of each word remaining in said Hashtable dictionary; and 
removing words below a frequency level from said Hashtable dictionary. 

8. (Original) The method in claim 7, further comprising inputting one or more of said stop 
words, said synonyms, and said frequency level. 

9. (Currently Amended) The method in claim 6, wherein said determining a frequency of 
phrases comprises: 

removing punctuation and case from said documents; 
removing stop words from said document; 
replacing words in said documents with synonyms; 

adding said phrases in each of said documents that contain only words in said Hashtable 
dictionary to said Hashtable dictionary; 

determining said frequency of said phrases remaining in said Hashtable dictionary; and 
removing phrases below a frequency level from said Hashtable dictionary. 

10. (Original) The method in claim 9, further comprising inputting one or more of said stop 
words, said synonyms, and said frequency level. 

11. (Currently Amended) A program storage device readable by machine, tangibly 
embodying a program of instructions executable by the machine to perform a method of 
automatically creating a dictionary for clustering text documents, said method comprising: 

inputting a maximum dictionary size; 

determining a frequency of each word in each of said documents; 

creating a Hashtable dictionary of most frequently occurring words in said documents as 
limited by said maximum dictionary size; 

determining a frequency of phrases in each of said documents that contain only words in 
said Hashtable dictionary; 

adding most frequently occurring phrases to said Hashtable dictionary; and 

outputting said most frequently occurring words and said most frequently occurring 
phrases as said dictionary. 

12. (Currently Amended) A program storage device as in claim 11, wherein said determining 
a frequency of each word comprises: 

removing punctuation and case from said documents; 
removing stop words from said document; 
replacing words in said documents with synonyms; 
removing duplicate words from said documents; 
adding remaining words to said Hashtable dictionary; 

determining said frequency of each word remaining in said Hashtable dictionary; and 
removing words below a frequency level from said Hashtable dictionary. 

13. (Original) A program storage device as in claim 12, further comprising inputting one or 
more of said stop words, said synonyms, and said frequency level. 

14. (Currently Amended) A program storage device as in claim 11, wherein said determining 
a frequency of phrases comprises: 

removing punctuation and case from said documents; 
removing stop words from said document; 
replacing words in said documents with synonyms; 

adding said phrases in each of said documents that contain only words in said Hashtable 
dictionary to said Hashtable dictionary; 

determining said frequency of said phrases remaining in said Hashtable dictionary; and 
removing phrases below a frequency level from said Hashtable dictionary. 

15. (Original) A program storage device as in claim 14, further comprising inputting said stop 
words. 

16. (Original) A program storage device as in claim 14, further comprising inputting said 
synonyms. 



17. (Original) A program storage device as in claim 14, further comprising inputting said 
frequency level. 